Inpainting: A powerful interpolation technique for helio- and asteroseismic data
In helio- and asteroseismology, it is important to have continuous, uninterrupted data sets. However, seismic observations usually contain gaps that must be taken into account. In particular, if the gaps are not randomly distributed, they produce a peak and a series of harmonics in the periodogram that can mask the stellar information. Interpolating the data can be a good solution to this problem. In this paper we study an interpolation method based on the so-called 'inpainting' algorithms. To test the algorithm, we used both VIRGO and CoRoT satellite data, to which we applied the realistic artificial window of a real CoRoT observing run to introduce gaps, and we compared the results with the original, non-windowed data. This allowed us to optimize the algorithm by minimizing the difference between the power spectrum density of the gapped data and that of the complete time series. In general, we find that the power spectrum of the inpainted time series is very similar to the original, unperturbed one, and the seismic inferences obtained after interpolating the data are the same as in the original case.
1 Introduction

Helio- and asteroseismology are powerful tools to accurately determine the structure of stellar interiors (e.g. Christensen-Dalsgaard et al. 1996; Chaplin et al. 2008), their dynamics (Thompson et al. 1996; Garcia et al. 2008), and global parameters such as their masses, radii, and ages (e.g. Stello et al. 2009).

To do so, it is important to have continuous data without regular gaps, which would introduce a series of spurious peaks in the power spectrum (e.g. Mosser et al. 2008). For instance, the time series obtained from the observations of the CoRoT (Convection, Rotation and planetary Transits) satellite (Michel et al. 2008) are periodically perturbed by high-energy particles hitting the satellite when it crosses the South Atlantic Anomaly (SAA) (e.g. Auvergne et al. 2009). The repetitive gaps produced by this regular perturbation induce spurious peaks in the power spectrum. To reduce the influence of these undesirable peaks, the data are commonly interpolated. In some cases a linear interpolation is sufficient (e.g. Appourchaux et al. 2008; Benomar et al. 2009; Garcia et al. 2009; Deheuvels et al. 2010), but in other cases a more sophisticated algorithm is necessary (e.g. Mosser et al. 2009). In this paper, we propose a different algorithm based on the so-called inpainting techniques (Elad et al. 2005; Pires et al. 2009), which seems especially suited to our purposes. Any improvement in gap filling is of special importance for the analysis of CoRoT data, but also for the forthcoming Kepler observations, for which very long time series (more than 3.5 years) are expected for thousands of different stars across the HR diagram (e.g. Bedding et al. 2010; Chaplin et al. 2010; Stello et al. 2010).

2 Inpainting algorithm 

Inpainting techniques are well known in the field of image processing. The method used in this paper relies on the sparse representation of the data introduced by Elad et al. (2005). It assumes that there exists a dictionary $\Phi$ in which the complete data are sparse and the incomplete data are less sparse. In other words, there exists a representation $\alpha = \Phi^T X$ of the signal $X$ in the dictionary $\Phi$ in which most coefficients $\alpha_i$ are close to zero.
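To make the sparsity assumption concrete, the following sketch (Python with NumPy, not the authors' code) builds an orthonormal DCT-II matrix, used here as a stand-in for the dictionary $\Phi$, and counts the significant coefficients of a complete signal and of the same signal with gaps. The two-atom test signal, the gap pattern, and the helper names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: alpha = C @ x (analysis), x = C.T @ alpha (synthesis)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # the constant atom needs a different normalisation
    return c

def n_significant(alpha, rel_tol=1e-3):
    """Count coefficients larger than rel_tol times the largest one."""
    a = np.abs(alpha)
    return int(np.sum(a > rel_tol * a.max()))

n = 512
C = dct_matrix(n)                  # plays the role of Phi^T
x = 3.0 * C[20] + 1.5 * C[75]      # complete signal: exactly two DCT atoms
mask = np.ones(n)                  # hypothetical mask: 10-sample gap every 100 samples
for start in range(50, n, 100):
    mask[start:start + 10] = 0.0
y = mask * x                       # gapped signal

print(n_significant(C @ x))        # 2
print(n_significant(C @ y))        # many more: the gaps spread energy over many atoms
```

The complete signal is exactly sparse by construction, while multiplying it by the mask spreads its energy over many coefficients, which is the property the inpainting step exploits.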

The solution is obtained by solving the following minimization problem:

$$\min_X \|\Phi^T X\|_1 \quad \text{subject to} \quad \|Y - MX\|^2 \le \sigma, \qquad (1)$$

where $X$ is the ideal complete time series, $Y$ the observed time series, and $M$ the mask (i.e. $M_i = 1$ for a valid data point and $M_i = 0$ elsewhere). Inpainting consists of recovering $X$ knowing $Y$ and $M$. In Eq. (1), $\sigma$ stands for the noise standard deviation, and we use the $\ell_1$ norm $\|z\|_1 = \sum_i |z_i|$.
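The constrained problem described above (minimize the $\ell_1$ norm of the coefficients $\Phi^T X$ while keeping $MX$ consistent with $Y$) is commonly attacked by iterative thresholding with a threshold that decreases over the iterations. The sketch below is a minimal illustration under that assumption, not the authors' implementation: it uses a single global DCT instead of the MSDCT, soft thresholding as the $\ell_1$ step, and a threshold decreasing linearly to zero, so the noise level $\sigma$ is not used. Function names, the iteration count, and the test signal are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x gives Phi^T X, C.T @ alpha gives Phi alpha."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def soft_threshold(a, lam):
    """Proximal operator of lam * ||a||_1: shrink every coefficient towards zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def inpaint_dct(y, mask, n_iter=100):
    """Fill the samples where mask == 0 by iterative soft thresholding in a
    global DCT dictionary, with a threshold decreasing linearly to zero."""
    y = np.asarray(y, dtype=float)
    mask = np.asarray(mask, dtype=float)
    C = dct_matrix(y.size)
    x = mask * y                                   # first guess: zeros in the gaps
    lam_max = np.max(np.abs(C @ x))
    for lam in np.linspace(lam_max, 0.0, n_iter):
        r = mask * y + (1.0 - mask) * x            # observed data where valid, estimate in the gaps
        x = C.T @ soft_threshold(C @ r, lam)       # keep only the significant DCT coefficients
    return mask * y + (1.0 - mask) * x             # valid samples are never modified

# Hypothetical test: a signal made of two DCT atoms, 10-sample gaps every 100 samples
n = 512
C = dct_matrix(n)
x_true = 3.0 * C[20] + 1.5 * C[75]
mask = np.ones(n)
for start in range(50, n, 100):
    mask[start:start + 10] = 0.0
x_rec = inpaint_dct(mask * x_true, mask)

gaps = mask == 0
err_zero = np.sqrt(np.mean(x_true[gaps] ** 2))               # error if gaps were left at zero
err_inp = np.sqrt(np.mean((x_rec[gaps] - x_true[gaps]) ** 2))
print(err_inp / err_zero)                                    # much smaller than 1
```

Starting with a high threshold lets the strongest components enter the estimate first; as the threshold falls, weaker components are added while the gaps are refilled at each step from the current sparse estimate.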

In helio- and asteroseismology, the best dictionary is based on the Discrete Cosine Transform (DCT). In the case of CoRoT, we built the mask M to remove the data points affected by the SAA crossings as well as all the other points flagged as bad in the datasets according to the status flag (Auvergne et al. 2009). Fig. 1 shows a sample of a typical CoRoT observation mask. The masked gaps in the CoRoT time series typically last less than 20 minutes and follow periodic patterns that originate from the orbital period of the satellite. About 10% of the data points are flagged as bad. Occasionally, longer gaps of the order of one hour occur in the CoRoT data. To treat this large variation in gap size, we used a wavelet decomposition and determined the range of frequencies that can be interpolated in each gap by changing the block size of the local DCT for each wavelet plane. This corresponds to a Multi-Scale Discrete Cosine Transform (MSDCT).
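A local DCT, the building block of the MSDCT, applies the DCT independently to consecutive blocks of the time series. The sketch below shows only that step for a given block size; the wavelet decomposition is omitted, the block sizes and the 32 s cadence are assumptions, and the helper names are hypothetical. It also computes the frequency of DCT atom k within a block, which shows how the block size limits the range of frequencies that can be represented.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def local_dct(x, block):
    """Orthonormal DCT-II of consecutive, non-overlapping blocks of length `block`.
    Returns an array of shape (n_blocks, block): one row of coefficients per block."""
    x = np.asarray(x, dtype=float)
    if x.size % block:
        raise ValueError("signal length must be a multiple of the block size")
    return x.reshape(-1, block) @ dct_matrix(block).T

def inverse_local_dct(coeffs):
    """Inverse of local_dct: synthesise each block and concatenate the blocks."""
    block = coeffs.shape[1]
    return (coeffs @ dct_matrix(block)).ravel()

def atom_frequency(k, block, dt):
    """Frequency (Hz) of DCT-II atom k in a block of `block` samples with cadence dt (s)."""
    return k / (2.0 * block * dt)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
for block in (32, 128, 512):                     # hypothetical block sizes
    assert np.allclose(inverse_local_dct(local_dct(x, block)), x)
print(atom_frequency(1, 128, 32.0) * 1e6)        # lowest non-zero frequency, about 122 microHz
```

A larger block reaches lower frequencies and samples the frequency axis more finely, but is less local in time; choosing a different block size for each wavelet plane is one way to balance the two.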



Fig. 1 Sample of the first 5 days of a typical CoRoT observation mask corresponding to the LRc02 observations.



3 Test of the inpainting algorithm
3.1 Test with VIRGO data

We tested our inpainting algorithm on VIRGO/SOHO data (Frohlich et al. 1995) by applying the observational mask of a CoRoT run with a typical duty cycle of 90%. With this test, we optimized the algorithm by minimizing the difference between the power spectrum density (PSD) of the inpainted gapped time series and that of the original one. For comparison, we also applied a linear interpolation to the masked time series.

Fig. 2 shows the PSD of the original VIRGO data (red) and of the masked time series (black). The PSD of the gapped time series has a peak at 161.7 μHz (the orbital frequency) and a sequence of harmonics combined with daily aliases (±11.57k μHz, where k is an integer). This pattern of spurious peaks makes it difficult to find and identify the p-mode signature around 3 mHz. We then applied our inpainting algorithm to this gapped time series. The resulting PSD, shown in Fig. 3, makes clear that the inpainting algorithm reduced the undesirable sequence of peaks due to the gaps.
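The origin of the spurious peaks can be checked on the spectral window alone. The sketch below, under assumed values (one-minute cadence, 10 days of data, and a hypothetical mask that removes the first 10% of each orbit of frequency 161.7 μHz), computes the power spectrum of the window with NumPy's FFT, locates its peak near the orbital frequency, and lists the orbital harmonics with their ±1 day⁻¹ aliases (1 day⁻¹ ≈ 11.57 μHz) expected between 2900 and 3100 μHz. The real CoRoT mask is not strictly periodic, so this is only a qualitative illustration; the helper names are hypothetical.

```python
import numpy as np

def window_power(mask, dt):
    """Power spectrum of a 0/1 observing window, normalised to 1 at zero frequency."""
    w = np.abs(np.fft.rfft(mask)) ** 2
    freq = np.fft.rfftfreq(mask.size, d=dt)
    return freq, w / w[0]

def alias_comb(f_orb, f_day, f_lo, f_hi, k_max=1):
    """Frequencies n*f_orb + k*f_day (n >= 1, |k| <= k_max) lying in [f_lo, f_hi]."""
    n_max = int(f_hi // f_orb) + 1
    return sorted(
        n * f_orb + k * f_day
        for n in range(1, n_max + 1)
        for k in range(-k_max, k_max + 1)
        if f_lo <= n * f_orb + k * f_day <= f_hi
    )

dt = 60.0                          # s, assumed one-minute cadence
n = 10 * 86400 // 60               # 10 days of data
t = np.arange(n) * dt
p_orb = 1.0 / 161.7e-6             # s, orbital period implied by 161.7 microHz
# Hypothetical mask: the first 10% of every orbit is lost (SAA-like gaps)
mask = ((t % p_orb) / p_orb >= 0.1).astype(float)

freq, wp = window_power(mask, dt)
band = (freq > 100e-6) & (freq < 220e-6)
f_peak = freq[band][np.argmax(wp[band])]
print(f_peak * 1e6)                # close to 161.7, within one frequency bin (about 1.16 microHz)

# Spurious peaks expected near the p-mode hump, in microHz
print(alias_comb(161.7, 11.57, 2900.0, 3100.0))
```

Because the window multiplies the signal in the time domain, every stellar peak in the PSD is convolved with this comb, which is why the orbital harmonics and their daily aliases overlap the p-mode region.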
Fig. 2 PSD (in units of ppm²/μHz) of the original (red) and masked (black) time series of VIRGO data.




Fig. 3 PSD (in units of ppm²/μHz) of the inpainted time series of VIRGO data.



Examples of the inpainted (red solid line) and linearly interpolated (black solid line) time series are illustrated in Fig. 4. The inpainting method fills the gaps of the time series by extrapolating the frequency content, as derived from the DCT, of the surrounding data. This is the main feature of the inpainting algorithm: rather than simply joining the points at the edges of each gap, as a linear interpolation does, it reconstructs the data inside the gaps from the available data.
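For comparison, the linear interpolation baseline can be sketched in a few lines with `numpy.interp`, assuming evenly spaced samples. Each gap is filled only from the two valid samples that bracket it, so it cannot reproduce oscillations with periods shorter than the gap; the function name is hypothetical.

```python
import numpy as np

def linear_fill(y, mask):
    """Fill samples where mask == 0 by linear interpolation between the
    nearest valid samples (np.interp holds the end values constant if a
    gap touches the start or the end of the series)."""
    y = np.asarray(y, dtype=float)
    good = np.asarray(mask) != 0
    idx = np.arange(y.size)
    out = y.copy()
    out[~good] = np.interp(idx[~good], idx[good], y[good])
    return out

print(linear_fill([0.0, 1.0, 99.0, 99.0, 4.0, 5.0], [1, 1, 0, 0, 1, 1]))
# [0. 1. 2. 3. 4. 5.]
```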

To check the amplitudes of the modes in the inpainted PSD, we computed the fractional difference in power between the original and the interpolated series at the ℓ = 0 and ℓ = 1 mode frequencies (see Fig. 5). The mode amplitudes in the linearly interpolated series are underestimated by 10% at the maximum of the p-mode hump, and this underestimation depends on frequency. On the other hand, the amplitudes retrieved after inpainting are roughly the same as in the original series, but with some excess power above 3.6 mHz that is currently under study.
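The fractional difference in power can be written as (P_interp − P_orig)/P_orig, evaluated at each mode. The sketch below assumes that P is the PSD summed over a small band around each mode frequency; the band half-width, the helper name, and the synthetic spectra in the example are hypothetical, not the values used for Fig. 5.

```python
import numpy as np

def fractional_power_difference(freq, psd_orig, psd_interp, mode_freqs, half_width):
    """Return (P_interp - P_orig) / P_orig for each mode, where P is the PSD
    summed over [nu - half_width, nu + half_width] around mode frequency nu."""
    freq = np.asarray(freq, dtype=float)
    result = []
    for nu in mode_freqs:
        band = np.abs(freq - nu) <= half_width
        p_orig = np.sum(psd_orig[band])
        p_interp = np.sum(psd_interp[band])
        result.append((p_interp - p_orig) / p_orig)
    return np.array(result)

# Synthetic example (microHz): an interpolated series whose power is uniformly 10% too low
freq = np.linspace(2000.0, 4000.0, 2001)
psd_orig = np.ones_like(freq)
psd_interp = 0.9 * psd_orig
diff = fractional_power_difference(freq, psd_orig, psd_interp, [2500.0, 3000.0], 2.0)
print(diff)   # about -0.1 for both modes
```

Negative values indicate power lost by the interpolation, as found for the linear interpolation near the maximum of the p-mode hump.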
Fig. 4 Sample of the time series with the original VIRGO data (dotted line), inpainted data (red solid line), and linearly interpolated data (black solid line).
Fig. 5 Fractional difference in power between the original and the interpolated time series of VIRGO data for some ℓ = 0 (triangles) and ℓ = 1 (plus signs) modes. Red and black symbols correspond to inpainting and linear interpolation, respectively.



3.2 Test on CoRoT data 

We have also tested our inpainting algorithm on the initial CoRoT runs, in which the "HELREG" calibrated data (see Auvergne et al. 2009 for details) produce a PSD with only a few orbital harmonics. For this test, we chose two different types of pulsating stars: a solar-like star with solar-like oscillations and a γ Doradus star.

Solar-like star: We tested the inpainting technique on the CoRoT target HD 181420 (Barban et al. 2009). In this run, the standard linear interpolation algorithm removed nearly all the harmonics of the 161.7 μHz orbital frequency, and an optimized interpolation technique was not needed. Thus, we multiplied this light curve by the observational window of the second long run in the galactic center direction (LRc02), for which linear interpolation could not clean up the resulting PSD (see Sect. 4), and we ran the same tests as for the VIRGO data. The PSD of the inpainted light curve is less noisy than that of the masked, linearly interpolated one, and the p-mode characteristics retrieved from the inpainted data agree with those retrieved from the original data set within the error bars.

Fig. 6 Distribution of the differences between the frequencies detected in the original (HELREG) and in the inpainted time series.



γ Doradus star: We also tested the inpainting method on the CoRoT data of HD 49434 (Uytterhoeven et al. 2008; Chapellier et al. 2010), observed in the galactic anticenter direction. In this run as well, an optimized interpolation technique was not necessary to obtain a PSD almost free of orbital perturbations. Thus, we again multiplied this light curve by the observational window of the LRc02 run. As for the solar-like star, the inpainted PSD is less noisy than that of the original light curve with the LRc02 window applied. Then, to check whether the asteroseismic results were affected, we performed a frequency analysis on the original and the inpainted PSDs. Fig. 6 shows the distribution of the differences between the frequencies detected in the original and in the inpainted time series. Since the frequency resolution of the time series is ≈ 0.0066 c/d (≈ 0.08 μHz), this distribution shows that most of the frequencies detected in the inpainted data coincide with the original ones within ± one resolution bin.
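The quoted resolution follows from 1/T, where T is the length of the time series: 0.0066 c/d corresponds to T ≈ 152 days and to 0.0066/86400 Hz ≈ 0.076 μHz. The sketch below reproduces that conversion and counts the fraction of reference frequencies that have a counterpart within a given tolerance; the matching rule, the helper names, and the example frequency lists are assumptions for illustration.

```python
import numpy as np

SECONDS_PER_DAY = 86400.0

def frequency_resolution(t_obs_days):
    """Return the resolution 1/T in cycles per day and in microHz."""
    res_cd = 1.0 / t_obs_days
    return res_cd, res_cd / SECONDS_PER_DAY * 1e6

def fraction_matched(f_ref, f_test, tol):
    """Fraction of frequencies in f_ref with a counterpart in f_test within +/- tol."""
    f_test = np.asarray(f_test, dtype=float)
    return float(np.mean([np.min(np.abs(f_test - f)) <= tol for f in f_ref]))

res_cd, res_muhz = frequency_resolution(1.0 / 0.0066)   # T of about 152 days
print(round(res_cd, 4), round(res_muhz, 3))             # 0.0066 0.076

# Hypothetical frequency lists in c/d: 2 of the 3 reference frequencies match
print(fraction_matched([1.0, 2.0, 3.0], [1.003, 2.05, 3.0], tol=res_cd))
```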



4 Applying the inpainting interpolation to stars in the LRc02 run

The stars observed in the LRc02 CoRoT field suffer from contamination of the PSD by a sequence of orbital harmonics plus daily aliases around each orbital peak. We applied the inpainting algorithm to the problematic LRc02 CoRoT time series of two main targets: HD 170987 (Mathur et al. 2010a) and HD 171834 (Uytterhoeven et al. 2010).
Fig. 7 Sample of the raw (HELREG) (red) and inpainted (black) time series of HD 170987.




Fig. 8 PSD of the raw (HELREG) (top) and inpainted (bottom) time series of HD 170987.
Fig. 9 PSD of the raw (HELREG) (top) and inpainted (bottom) time series of HD 171834.



Fig. 7 shows a sample of the raw and inpainted time series of HD 170987, a solar-like star. The top panel of Fig. 8 shows the PSD of the raw time series of this star: the orbital frequency, its harmonics, and their daily aliases clearly pollute the whole spectrum. The bottom panel of Fig. 8 shows the PSD of the inpainted time series, in which only the first orbital harmonics remain visible.

Fig. 9 shows the PSD of the raw (HELREG) and inpainted time series of HD 171834. Once again, the orbital harmonics and the daily aliases are significantly reduced after inpainting, which improves the seismic analysis of this star.



5 Conclusions 

We have shown that inpainting based on the MSDCT is a powerful interpolation algorithm, well adapted to correcting the data gaps in helio- and asteroseismic observations. We have already applied it to the CoRoT data of the solar-like target HD 170987 (Mathur et al. 2010a). We plan to integrate it into our automatic asteroseismic pipeline for the analysis of Kepler data (Mathur et al. 2010b) and to use it to correct the GOLF velocity time series (Garcia et al. 2005).